Skip to content

BUG: Fix __ne__ comparison for Categorical #32304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 3, 2020
Merged

BUG: Fix __ne__ comparison for Categorical #32304

merged 8 commits into from
Mar 3, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Feb 27, 2020

@@ -103,7 +103,10 @@ def func(self, other):
mask = (self._codes == -1) | (other_codes == -1)
if mask.any():
# In other series, the leads to False, so do that here too
ret[mask] = False
if opname == "__ne__":
ret[mask & (self._codes == other_codes)] = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This redundancy was intentional and an attempt to "reuse" the mask calculation above since I think it may short-circuit when mask is False, but could just make it (self._codes == -1) & (other_codes == -1) as well. I also think __ne__ is the only situation that'd have to be special-cased here out of the comparison operators. Also worth pointing out that this is assuming we're dealing with NaN rather than NA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self._codes == -1) & (other_codes == -1)

I think this would be more clear. Also there might be a cached _isnan attribute for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsaxton im working on cleaning up these comparisons, am confused by this line. can we make this use the more common pattern

fill_value = True if op is operator.ne else False
[...]

ret = op(self._codes, other_codes)
mask = (self._codes == -1) | (other_codes == -1)
ret[mask] = fill_value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel That is much simpler and seems correct to me

@@ -282,6 +282,16 @@ def _compare_other(self, s, data, op_name, other):
with pytest.raises(TypeError, match=msg):
op(data, other)

def test_not_equal_with_na(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we get this right in cases with e.g. datetime64 or datetime64tz categories?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work, added some parameterization over other category types

@jreback jreback added Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 3, 2020
@jreback jreback added this to the 1.1 milestone Mar 3, 2020
@jreback jreback merged commit 821aa25 into pandas-dev:master Mar 3, 2020
@jreback
Copy link
Contributor

jreback commented Mar 3, 2020

thanks @dsaxton

@dsaxton dsaxton deleted the noteq-cat branch April 9, 2020 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Categorical NaN behaviour different from a str
3 participants